ABSTRACT

The classical reliability model for N-modular redundancy (NMR) assumes the network to be failed when a majority of the modules which drive the same voter fail. It has long been known that this model is pessimistic since there are instances, termed compensating module failures, where a majority of the modules fail but the network is nonfailed. A different module reliability model based on lead reliability is proposed which has the classical NMR reliability model as a special case. Recent results from the area of test generation are employed to simplify the module reliability calculation under the lead reliability model. First a fault equivalent technique, based on functional equivalence of faults, is developed to determine the effect of compensating module failures on system reliability. This technique can increase the predicted mission time (the time the system is to operate at or above a given reliability) by at least 40% over the classical model prediction for simple networks. Since the fault equivalent technique is too complex for modeling of large circuits, a second, computationally simpler technique, based on fault dominance, is derived. It is then shown to yield results comparable to the fault equivalent technique. A more complex example circuit analyzed by the fault dominance model shows at least a 75% improvement in mission time due to modeling compensating module failures. A commercially available 31 gate integrated circuit chip is also modeled to demonstrate the applicability of the technique to large circuits.

Key Phrases: Triple modular redundancy (TMR), compensating module failures, fault equivalence, fault dominance, mission time improvement.

INTRODUCTION 

New system designs for reliable computers must be explored to meet the increasing demand for reliable computing systems. One important method of predicting the performance of a system is the modeling of the system reliability.

Modeling requires a mathematical or physical representation which incorporates the salient parameters of the modeled system [1]. A model is an incomplete representation of the subject under study. To be of value, the modeling technique must be convenient to apply and must successfully predict the behavior of the subject under various parameter changes. If a reliability model is accurate, then insights can be gained as to how the system reliability changes as a function of the design parameters.

An exact method to model the effect of a majority of failed modules in the N-modular redundancy (NMR) scheme is presented and shown to increase the predicted mission time (the time for which the system is to operate at or above a given reliability) over that of the classical reliability model by at least 40%. This exact method is too complex to apply to a large circuit. Thus a second and computationally simpler method is developed and shown to yield a predicted mission time within 10% of the exact model for example systems.

CLASSICAL NMR RELIABILITY MODEL


NMR is implemented by dividing the nonredundant network into modules, replicating the modules N times (where N = 2t + 1 and t is an integer), and inserting a majority gate between each set of replicated modules. Figure 1 depicts the implementation of a triple modular redundancy (TMR) version of a portion of a nonredundant network consisting of a two input, single output module. TMR will be the major topic of discussion, although the procedures presented have straightforward applications to the general case of NMR.

Classically the reliability of the network in Figure 1 is modeled by assigning the modules a reliability function, call it $R_m(t)$, or $R_m$ with time as an understood variable. The probability of module failure is thus $1 - R_m$. It is then assumed that the system fails when two or more modules driving the same voter, say voter A in Figure 1, fail. The classical reliability model is:

$$R_{TMR} = R_m^3 + 3R_m^2(1 - R_m) \tag{1}$$
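Equation (1) can be checked numerically. The following is a minimal sketch, not part of the original analysis: it evaluates the closed form and compares it with a direct enumeration of the eight working/failed states of three statistically independent modules. The function names are illustrative assumptions.

```python
from itertools import product


def classical_tmr(r_m: float) -> float:
    """Equation (1): all three modules work, or exactly two of them work."""
    return r_m**3 + 3 * r_m**2 * (1 - r_m)


def classical_tmr_by_enumeration(r_m: float) -> float:
    """Sum the probability of every module state with fewer than two failures."""
    total = 0.0
    for state in product((True, False), repeat=3):  # True = module working
        if sum(state) >= 2:
            prob = 1.0
            for working in state:
                prob *= r_m if working else (1 - r_m)
            total += prob
    return total
```

For a module reliability of 0.9 both functions give 0.972, which is the pessimistic figure that the later sections improve on.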


The effect of nonperfect voters can readily be incorporated into (1) if voters are assigned to module inputs [3,4,5]. Since each voter drives exactly one module input, a voter failure has the same effect as a module failure. If $R_v$ is the voter reliability, then the effective module reliability in (1) becomes $R_v^2 R_m$. Networks that do not have a voter for every module input can be modeled by more complex techniques [6]. However, for the present discussion we will assume every module input has an associated voter. Further, the discussion will center on applying the modeling techniques to modules only. Nonperfect voters can easily be modeled by applying the techniques to modules and their associated input voters.

Equation (1) is pessimistic since there are many cases in which a majority of the modules are failed yet the network of Figure 1 would not be failed. For example, consider two failed modules for the network of Figure 1. Assume module one has a permanent logical one on its output while module three has a permanent logical zero output. The network will still realize its designed function. Such multiple module failures which do not lead to network failures will be termed compensating module failures.

Adding these double, and even triple, module failure cases can often lead to a substantially higher predicted reliability than the classical reliability model. With a better reliability model some systems previously designed may be found to be overdesigned for their specific mission because an inadequate reliability model was used.
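The compensating failure just described (module one stuck at logical one, module three stuck at logical zero) can be illustrated with a few lines of code. This is a minimal sketch under the stated fault assumption; the helper names are hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    """Majority voter over three one-bit module outputs."""
    return 1 if a + b + c >= 2 else 0


def voter_output(correct_bit: int) -> int:
    """Voter output when modules one and three have failed as in the text."""
    module_1 = 1            # output permanently at logical one
    module_2 = correct_bit  # fault free module
    module_3 = 0            # output permanently at logical zero
    return majority(module_1, module_2, module_3)


def network_is_correct() -> bool:
    """True if the voter reproduces the fault free value for both bit values."""
    return all(voter_output(bit) == bit for bit in (0, 1))
```

The two stuck outputs always disagree, so the fault free module casts the deciding vote and the network still realizes its designed function even though a majority of modules have failed.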


[Figure 1. Classical triple-modular redundancy: Module 1, Module 2 and Module 3 feeding a voter.]
MODELING COMPENSATING FAILURES

In the literature, equation (1) is sometimes rewritten to take into account the cases where two modules can fail so as to have compensating effects at the voter.

$$R = R_m^3 + 3R_m^2(1 - R_m) + K(3R_m)(1 - R_m)^2 \tag{2}$$

The K in (2) is a probability formed by the ratio of the number of ways in which compensating failures can occur divided by the number of ways any failure can occur. K has often been taken as 1/2 [7].

An alternative model for compensating failures that has appeared in the literature [7] is:

$$R_{TMR} = R_m^3 + 3R_m^2(1 - R_m) + R_m^3 \sum_{m=1}^{\infty} \sum_{n=1}^{m} \sum_{r=0}^{n} K_{mnr} P_{mnr} (\lambda T)^{m+n+r} \tag{3}$$

where the module failures follow the Poisson assumption; there are $K_{mnr}$ ways of designating which of the three modules have m, n, and r failures respectively; and $P_{mnr}$ is the probability that the system operates correctly with m failures in one module, n failures in another, and r failures in the other. Equation (3) can be rewritten as


$$R_{TMR} = R_m^3 + 3R_m^2(1 - R_m) + R_{Two} + R_{Three} \tag{4}$$

where $R_{Two}$ and $R_{Three}$ are the contributions to the system reliability from compensating failures in two and three modules respectively.

Methods for calculating K or $P_{mnr}$ are not described in the literature. The next two sections will develop a technique to calculate $R_{Two}$ and $R_{Three}$ based on a lead failure model for module reliability. The technique can also be used to calculate the $P_{mnr}$ of equation (3), if some assumptions about the relationship between the lead failure model and Poisson failure model of module reliability are made.

This exact technique is only practical for small circuits. A computationally simpler method, employing the concepts of fault equivalence and fault dominance, is derived for determining the contribution to $R_{Two}$ of two failed modules with one failure in each. The latter method is validated as a good approximation by comparison to the exact technique.

MODULE  FAILURE  MODEL 


In order to calculate $R_{Two}$ and $R_{Three}$ we will have to define what we mean by a module failure. Research in the area of testing and diagnosing combinational and sequential logic circuitry has relied heavily on the logical stuck-at-fault model [8]. This model assumes that most or all failures of interest in a logic circuit manifest themselves as some line in the circuit taking on a constant logical value, either one or zero. Now that an algebraic structure which applies to the behavior of networks in the presence of stuck-at faults has been developed [8,11,12], the tools are available to formulate and analyze a new module reliability model.

The new model will assign a reliability function to each lead in the network rather than each module as in the classical model. Lead reliability will be represented by R and the probability of lead failure by 1 - R.

Much has been written in defense of the stuck-at failure model [8] but a few words will now be devoted to justification of the lead reliability model. In one study of IC failure mechanisms [9] it was found that about 84% of the IC failures were directly related to lead failures, either of input leads or of metalization on the chip itself.

Similar to the classical model assumption that module failures are statistically independent events, it will also be assumed that lead failures are statistically independent. A further advantage of the lead reliability model is that it takes into account the increased number of interconnections required for the massive redundancy version of a nonredundant system. Wiring errors and off-chip interconnections then may be the major source of failures.

It will be assumed that the stuck-at-one (s-a-1) faults are as likely to occur as the stuck-at-zero (s-a-0) faults. Hence, the probability that a lead is s-a-1 is:

$$P(\text{s-a-1}) = P(\text{lead failure}) \cdot P(\text{s-a-1} \mid \text{lead failure})$$
$$P(\text{s-a-1}) = (1 - R) \cdot 1/2 \tag{5}$$


FAULT  EQUIVALENT  RELIABILITY  MODEL 


Using the above model for module failure it is now possible to calculate $R_{Two}$. First, a few assumptions are necessary. The modules will be assumed to consist of irredundant combinational logic [8] so that any single internal module fault will cause an improper output for at least one set of inputs. It will also be assumed that the system has failed as soon as it is possible for the voter to give a wrong response to any possible input combination. This excludes the situations where a module trio fails but subsequent faults within the module trio restore proper behavior.

To model the faulty modules we will adopt the notation developed in [8]. We will now illustrate the evaluation of $R_{Two}$, i.e., the case of two faulty modules, for a simple module.

1) Transform the logical circuit into the corresponding logical model [8].

Consider Figure 2(a) where the module under study is a single two input NAND gate. The logical model is a directed graph shown in Figure 2(b).


2) Form the functional equivalence classes for all single and multiple faults in the logical model [8].

A fault is said to be functionally equivalent to another fault if and only if the output function realized by the module with only the first fault present is equal to the function realized when only the second fault is present. For example, a/0 and c/1 (the notation l/i means line l stuck at logical value i) are functionally equivalent. Table 1(a) shows the fault classes and their members. Here the null fault represents the fault free network. The functional equivalence classes are assigned numbers arbitrarily.

For a TMR system, two faults, $f_i$ and $f_j$, occurring in different modules are said to be supplementary if their simultaneous presence does not cause network failure. Two functional equivalence fault classes are called supplementary classes if the faults contained in one class are supplementary to faults in the other.


3) Enumerate the supplementary classes.

For the case of two module failures one of the modules will be the fault free function. The majority gate can be considered to be a threshold gate with input weights 1 and threshold of 2 [10]. In Table 1(b) the Karnaugh maps represent the fault functions for the faults a/1 and b/1 respectively. We continue to try all possible combinations of faulty function pairs until all supplementary classes are formed. These are shown in Table 1(c) for our example.

In the last step a matrix E is used to actually evaluate $R_{Two}$. Element $E_{i,j}$ of the equivalence class matrix is the number of faults in equivalence class j (the equivalence classes were assigned numbers under step 2) which are a result of i leads in a module failing, where i is termed the fault multiplicity.


4) Form the term for two faulty modules by use of the equivalence class matrix E and the equation:

$$R_{Two} = 3 \sum_{k=2}^{2p} \frac{1}{2^k} R^{3p-k}(1-R)^k \sum_{\ell=1}^{k-1} \sum_{(i,j)} E_{\ell,i} E_{k-\ell,j} \tag{6}$$

where the innermost sum is taken over all i, j such that (i,j) is a supplementary class, p is the number of leads in a module, k is the sum of the line failures in the two failed modules, and $E_{\ell,i}$ is taken as zero for $\ell > p$.

The development of step 4) is best given by an example. The equivalence class matrix for the NAND gate of Figure 2 is derived from the entries in Table 1(a) and is shown in Table 2.

For our example of two failed NAND modules (6) becomes:

$$R_{Two} = 3\left[\frac{20}{4}R^7(1-R)^2 + \frac{72}{8}R^6(1-R)^3 + \frac{118}{16}R^5(1-R)^4 + \frac{96}{32}R^4(1-R)^5 + \frac{32}{64}R^3(1-R)^6\right] \tag{7}$$
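Steps 1) through 4) can be reproduced by brute force for the single NAND gate. The following is a minimal sketch, not the authors' program: it enumerates every single and multiple stuck-at fault on leads a, b, c, computes each fault function, tests every ordered pair of faulty modules against a threshold-2 voter, and tallies the supplementary pairs by total number of failed leads k. The function names and the dictionary representation of a fault are assumptions made for illustration; the leading factor 3 in `r_two` (the choice of which module is fault free) is likewise an assumption of this sketch.

```python
from itertools import combinations, product

LEADS = ("a", "b", "c")                    # NAND inputs a, b and output c
INPUTS = list(product((0, 1), repeat=2))   # all (x, y) input combinations


def nand_output(x, y, fault):
    """Output of the NAND gate when the leads named in `fault` are stuck."""
    a = fault.get("a", x)
    b = fault.get("b", y)
    c = 1 - (a & b)
    return fault.get("c", c)


def fault_function(fault):
    """Truth table realized by the module with `fault` present."""
    return tuple(nand_output(x, y, fault) for x, y in INPUTS)


def all_faults():
    """Every single and multiple stuck-at fault as a {lead: value} dict."""
    for k in range(1, len(LEADS) + 1):
        for leads in combinations(LEADS, k):
            for values in product((0, 1), repeat=k):
                yield dict(zip(leads, values))


GOOD = fault_function({})  # fault free function (the null fault)


def supplementary(func_1, func_2):
    """True if a threshold-2 voter still outputs the fault free function."""
    return all((g + u + v >= 2) == (g == 1)
               for g, u, v in zip(GOOD, func_1, func_2))


def pair_counts():
    """counts[k] = ordered supplementary fault pairs with k failed leads in total."""
    faults = [(len(f), fault_function(f)) for f in all_faults()]
    counts = {}
    for (m, func_1), (n, func_2) in product(faults, repeat=2):
        if supplementary(func_1, func_2):
            counts[m + n] = counts.get(m + n, 0) + 1
    return counts


def r_two(r, counts, p=3):
    """Two-module compensating term built from the tallied pair counts."""
    return 3 * sum(c / 2**k * r**(3 * p - k) * (1 - r)**k
                   for k, c in counts.items())
```

Under these assumptions `pair_counts()` returns the numerators 20, 72, 118, 96 and 32 of equation (7) for k = 2 through 6. The enumeration grows exponentially with the number of leads, which is the practical limitation of the fault equivalent technique.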
Table 1. (a) The fault classes, (b) an example of the test for supplementary fault classes, and (c) the supplementary classes for the NAND gate example of Fig. 2.

(a) Fault classes

Class    Fault function           Members
C_F1     $\bar{X} + \bar{Y}$      {null fault}
C_F2     $\bar{Y}$                {a/1}
C_F3     $\bar{X}$                {b/1}
C_F4     0                        {c/0;
                                   a/1,c/0; a/0,c/0; a/1,b/1; b/1,c/0; b/0,c/0;
                                   a/1,b/1,c/0; a/1,b/0,c/0; a/0,b/0,c/0; a/0,b/1,c/0}
C_F5     1                        {a/0; b/0; c/1;
                                   a/1,c/1; a/0,c/1; b/1,c/1; b/0,c/1; a/1,b/0; a/0,b/0; a/0,b/1;
                                   a/1,b/1,c/1; a/1,b/0,c/1; a/0,b/0,c/1; a/0,b/1,c/1}

(b) Test for supplementary fault classes, classes #2 (a/1) and #3 (b/1)

Map                        y=0,x=0   y=0,x=1   y=1,x=0   y=1,x=1
#1 (fault free)               1         1         1         0
#2 (a/1)                      1         1         0         0
#3 (b/1)                      1         0         1         0
Threshold map                 3         2         2         0
Voter output function         1         1         1         0

(c) Supplementary classes

{(2,3), (2,5), (3,5), (4,5), (3,2), (5,2), (5,3), (5,4)}
Table 2. The equivalence class matrix for the NAND gate of Figure 2.

Number of              Equivalence Class
Failed Leads      1     2     3     4     5
     0            1     0     0     0     0
     1            0     1     1     1     3
     2            0     0     0     5     7
     3            0     0     0     4     4
- —•  U u M-u  *.  «...  of  th„.  „duU  ,.u„„  M 

; “,e"a*bu  w °th*'  “mpu  i"*  — — <««». .. - ... 

logical stuck-at fault may be much more likely than the other. If so, P(s-a-i | lead failure) could be approximately 1.0 and the network modeled by considering s-a-i type faults only. The comparison of this reliability model with the one of equation (1) will now be undertaken.

COMPARISON OF FAULT EQUIVALENT AND CLASSICAL RELIABILITY MODELS

If there are p leads in a module, then the module reliability $R_m$, according to the fault equivalent model just presented, is $R^p$. For the case of fewer than half the modules failing in an NMR network the classical reliability model gives a reliability of:

$$\sum_{i=0}^{\lfloor N/2 \rfloor} \binom{N}{i} R_m^{N-i} (1 - R_m)^i \tag{8}$$

It will now be shown that the first $\lfloor N/2 \rfloor + 1$ terms of the fault equivalent reliability model are identical to (8), the classical NMR reliability model.

Theorem 1: The fault equivalent reliability model is identical to the classical reliability model for $\lfloor N/2 \rfloor$ or fewer module failures.

Proof: The probability of no module failures is $(R^p)^N = R_m^N$, which is the first term of (8). Now for any number of module failures less than or equal to $\lfloor N/2 \rfloor$, the network is nonfailed under both models. So to complete the proof of the theorem, all we need to show is that the classical module failure probability, $1 - R_m$, is equal to the single module failure probability of the fault equivalent model.

The fault equivalent failure probability for a single module is:

$$\sum_{k=1}^{p} \frac{1}{2^k} \Big( \sum_{\forall i} E_{k,i} \Big) R^{p-k} (1-R)^k \tag{9}$$
The $E_{k,i}$ term, considering the cases for all i, is $2^k \binom{p}{k}$ since there are $\binom{p}{k}$ ways to select k failed leads from p. Each failed lead may be in one of two failure modes, s-a-1 or s-a-0, which accounts for the $2^k$. Hence (9) becomes:

$$\sum_{k=1}^{p} \binom{p}{k} (1-R)^k R^{p-k} = -R^p + 1 \tag{10}$$

which is the probability of module failure using the classical reliability model. This completes the proof of Theorem 1.
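The last step of the proof is an application of the binomial theorem, and it can be confirmed numerically. This is a minimal sketch; the function names are illustrative, and `row_sum` encodes the statement that the entries of row k of E add up to $2^k \binom{p}{k}$.

```python
from math import comb


def row_sum(k: int, p: int) -> int:
    """Number of distinct faults with exactly k failed leads out of p."""
    return 2**k * comb(p, k)


def fault_equivalent_module_failure(r: float, p: int) -> float:
    """Equation (9) with the row sums of E substituted in."""
    return sum(row_sum(k, p) / 2**k * r**(p - k) * (1 - r)**k
               for k in range(1, p + 1))


def classical_module_failure(r: float, p: int) -> float:
    """Right-hand side of (10): 1 - R^p, i.e. 1 - R_m."""
    return 1 - r**p
```

For the NAND gate (p = 3) the row sums are 6, 12 and 8, matching the 26 faults listed in Table 1(a).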

The classical reliability model (1) was compared to the fault equivalent model for the single NAND gate module of Figure 2 and a two NAND gate module. The comparison was made in terms of mission time improvement I [7]. I is the ratio of the times at which two reliability models have the same reliability. To obtain I, the classical reliability at time t was equated to the new reliability at time It and the resultant equation solved for I. The mission time improvement for the fault equivalent model (evaluated for compensating failures in only two modules) over the classical reliability model is shown in Table 3. It is important to note that the apparent improvement in mission time is not due to any change in the modeled hardware system but rather to a more accurate reliability model.
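The definition of I can be turned into a small numerical procedure. The following is a minimal sketch for the single NAND gate only, and it rests on assumptions that are not spelled out in the text: lead reliability is taken to be exponential in time, so that $R(It) = R(t)^I$; the two-module term uses the numerators of equation (7) with a leading factor of 3; and the equation is solved by bisection on an assumed bracket of 1 to 3. All names are hypothetical.

```python
COEFFS = {2: 20, 3: 72, 4: 118, 5: 96, 6: 32}  # numerators of equation (7)
P = 3                                          # leads in the NAND module


def classical_tmr(r_m):
    """Equation (1)."""
    return r_m**3 + 3 * r_m**2 * (1 - r_m)


def r_two(r):
    """Two-module compensating term for lead reliability r (assumed factor 3)."""
    return 3 * sum(c / 2**k * r**(3 * P - k) * (1 - r)**k
                   for k, c in COEFFS.items())


def fault_equivalent_tmr(r):
    """Classical terms with R_m = R^p, plus the compensating term."""
    return classical_tmr(r**P) + r_two(r)


def mission_time_improvement(r_m, lo=1.0, hi=3.0, iterations=60):
    """Solve fault_equivalent(R(I t)) = classical(R_m(t)) for I by bisection."""
    r = r_m ** (1.0 / P)           # lead reliability at time t
    target = classical_tmr(r_m)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        # Exponential assumption: lead reliability at time I*t is r**I.
        if fault_equivalent_tmr(r**mid) > target:
            lo = mid               # still above target, so I can grow
        else:
            hi = mid
    return (lo + hi) / 2
```

Under these assumptions the sketch returns a value close to 1.5 for a module reliability of 0.99. It is not expected to reproduce published figures digit for digit, since the time dependence of R used for the original calculation is not stated.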

Table 3. Mission time improvement, I, of $[R_m^3 + 3R_m^2(1-R_m) + R_{Two}]$ over $[R_m^3 + 3R_m^2(1-R_m)]$ for two simple modules.

                      0.7      0.8      0.9      0.95     0.99
Single NAND gate      1.474    1.477    1.484    1.491    1.496
Two NAND gate         1.491    1.497    1.515    1.526    1.539


It can be seen that a mission time improvement of 50% can be obtained by adding the effect of $R_{Two}$ to the classical reliability model. Another way of looking at the parameter I is that if the classical model is used, then the resultant system is overdesigned by 50% since it could meet its mission time specification with less reliable components.

The technique outlined above for the fault equivalent reliability model can also be employed to determine the $P_{mnr}$ terms of equation (3). In (3) a module is assumed to follow the Poisson distribution, e.g. the probability that there are exactly n failures in a given period of time t is:

$$\frac{(\lambda t)^n}{n!} e^{-\lambda t}$$

In (3) a module can have an infinite number of failures. If we associate a failure in a module to a lead failure and assume $P_{mnr} = 0$ for m, n, r > p, where p is the number of leads per module, then $P_{mn0}$ is given by:

$$P_{mn0} = \frac{1}{\binom{p}{m}\binom{p}{n} 2^{m+n}} \cdot \sum_{(i,j)} E_{m,i} E_{n,j} \tag{11}$$

where the sum is taken over all i, j such that (i,j) are supplementary classes. The first term accounts for the number of ways two modules can fail with m failures in one and n in the other. Further, each lead can fail in one of two ways (s-a-0, s-a-1). The second term of (11) is the number of ways two modules with m and n lead failures can form compensating failures. The modifications of (11) to calculate $P_{mnr}$ are obvious.

The  fault  equivalence  class  matrix  E is  the  computational  bottleneck  for  the  fault  equivalent  model. 
A significantly  simpler  model,  based  on  fault  dominance,  is  developed  in  the  next  section.  Subsequently 
it  will  be  compared  to  the  fault  equivalent  model  and  shown  to  be  a tight  lower  bound. 

FAULT  DOMINANCE  RELIABILITY  MODEL 

One might speculate that the first term in the summations of equation (3) (m = 1, n = 1, r = 0) or equation (6) (k = 2) might be the dominant term as far as compensating failures are concerned. That this is the case will be demonstrated by comparison with the fault equivalent model and a multiple fault model. Hence we shall consider the case of two module failures with one failure per module. A simple formula for $P_{110}$ and $R_{Two}$ (with one lead failure in each of two modules) assuming tree structured modules will now be derived. Approximations for modules with reconvergent fanout will also be given.

The modules adhere to the same assumptions as the fault equivalent reliability model. The following definitions will be required:

Definition 1: A test set, $T_i$, for a fault $f_i$ is that set of inputs which can detect fault $f_i$; i.e., produces a different output if $f_i$ occurs than when no fault is present.

Definition 2: Fault $f_i$ is equivalent to fault $f_j$, denoted by $f_i \equiv f_j$, if and only if $T_i = T_j$, where "=" denotes set equivalence.

Definition 3: Fault $f_2$ is said to dominate $f_1$ iff $T_2 \supseteq T_1$, where $\supseteq$ denotes set covering. Conversely $f_1$ is said to be dominated by $f_2$.

The necessary and sufficient conditions for compensating module failures can now be given. $\phi$ stands for the empty set.

Theorem 2: Single faults $f_1$ and $f_2$ are supplementary (written $f_1 \sim f_2$) if and only if $T_1 \cap T_2 = \phi$.

Proof:

Part I: If $T_1 \cap T_2 = \phi$ then $f_1 \sim f_2$.

For any input $t_1 \in T_1$, $F(f_1) \oplus F = 1$ where F is the nonfaulty output and $F(f_1)$ is the output with fault $f_1$ present. But $F(f_2) \oplus F = 0$, or $F(f_2) = F$, since $t_1$ does not detect $f_2$. Thus majority$(F(f_1), F(f_2), F) = F$. Similarly for any input $t_2 \in T_2$, $F(f_2) \oplus F = 1$ and $F(f_1) \oplus F = 0$. So majority$(F(f_1), F(f_2), F) = F$. Hence if $T_1 \cap T_2 = \phi$ then $f_1 \sim f_2$.

Part II: If $f_1 \sim f_2$ then $T_1 \cap T_2 = \phi$.

The proof is by contradiction. Assume $f_1 \sim f_2$ but $T_1 \cap T_2 \neq \phi$. Take the case $T_1 \cap T_2 = \{t_1\}$. Since $t_1$ is a test for $f_1$, $F(f_1) \oplus F = 1$ for the input $t_1$. Likewise $t_1$ is a test for $f_2$ and $F(f_2) \oplus F = 1$. Then majority$(F(f_1), F(f_2), F) \neq F$ and $f_1 \not\sim f_2$. The original assumption is contradicted and therefore if $f_1 \sim f_2$ then $T_1 \cap T_2 = \phi$. This completes the proof.
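Theorem 2 can be checked exhaustively on a small module. The following is a minimal sketch for the single faults of a two input NAND gate: for every ordered pair of faults it compares the voter-based definition of supplementary faults with the disjointness of the two test sets. Representing a fault as a (lead, value) tuple and the function names are assumptions made for illustration.

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=2))
SINGLE_FAULTS = [(lead, value) for lead in "abc" for value in (0, 1)]


def nand(x, y, fault=None):
    """NAND gate output with an optional single stuck-at fault (lead, value)."""
    lead, value = fault if fault else (None, None)
    a = value if lead == "a" else x
    b = value if lead == "b" else y
    c = 1 - (a & b)
    return value if lead == "c" else c


def detecting_inputs(fault):
    """Test set of `fault`: inputs whose output differs from the fault free one."""
    return {t for t in INPUTS if nand(*t, fault) != nand(*t)}


def supplementary(f1, f2):
    """True if the voter over (fault free, f1, f2) modules is always correct."""
    for t in INPUTS:
        good = nand(*t)
        votes = good + nand(*t, f1) + nand(*t, f2)
        if (votes >= 2) != (good == 1):
            return False
    return True


def theorem2_holds():
    """Supplementary exactly when the two test sets are disjoint."""
    return all(
        supplementary(f1, f2) == detecting_inputs(f1).isdisjoint(detecting_inputs(f2))
        for f1, f2 in product(SINGLE_FAULTS, repeat=2)
    )
```

The check passes for all 36 ordered pairs, and 20 of them are supplementary, which is the k = 2 numerator of equation (7). Working with test sets avoids building the full equivalence class matrix.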

There are two corollaries to Theorem 2 which will prove helpful.

Corollary 2.1: If $f_1 \equiv f_2$ then $f_1 \not\sim f_2$.

Proof: If $f_1 \equiv f_2$ then $T_1 = T_2$ by the definition of fault equivalence. Thus $T_1 \cap T_2 \neq \phi$ and $f_1 \not\sim f_2$.

Corollary 2.2: If $f_1$ dominates $f_2$ then $f_1 \not\sim f_2$.

Proof: If $f_1$ dominates $f_2$ then $T_1 \supseteq T_2$ by the definition of fault dominance. Further $T_2 \neq \phi$ since the module was assumed irredundant and a fault for which there is no detection test is considered redundant. Hence $T_1 \cap T_2 \neq \phi$ and $f_1 \not\sim f_2$.

The modules will also be assumed to be composed of elementary gates: Invert, AND, OR, NAND, and NOR are depicted in Figure 3. The equivalent and dominating fault class structure, as developed in [8,11,12], is also depicted. Consider the two input AND gate in Figure 3. In the graphical representation of the AND gate each lead is represented by a pair of circles; the upper circle stands for a s-a-1 fault, the lower one for a s-a-0. Equivalent faults are connected by a straight line. Faults related by dominance are connected by an arrow pointing from the dominating fault to the dominated fault. The test sets for the various faults are also given in Figure 3. The notation $T_{a/1}$ means the test set for input a being s-a-1. The first element of the test vector corresponds to the uppermost lead, etc. Thus, to detect a s-a-1 in the two input AND gate a 0 should be applied to input 'a' and a 1 to input 'b'.

An important observation to make from Figure 3 is that for elementary gates the test sets for all faults, other than equivalent faults, are disjoint. This includes faults dominated by the same output fault. A test for a circuit can only put one value on each lead of a circuit, thus the circuit test sets for two faults in an elementary gate must be disjoint if the faults are not equivalent.
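The test sets of an elementary gate are small enough to list by machine. The following is a minimal sketch for the two input AND gate; the (lead, value) fault encoding and the function names are illustrative assumptions. It classifies each pair of single faults as equivalent (equal test sets), related by dominance (one test set covers the other), or disjoint. Pairs related by dominance share tests by definition, so the disjointness observation applies to the remaining pairs, such as the two input faults dominated by the same output fault.

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=2))   # (a, b), uppermost lead first
SINGLE_FAULTS = [(lead, value) for lead in "abc" for value in (0, 1)]


def and_gate(a, b, fault=None):
    """AND gate output with an optional single stuck-at fault (lead, value)."""
    lead, value = fault if fault else (None, None)
    a = value if lead == "a" else a
    b = value if lead == "b" else b
    c = a & b
    return value if lead == "c" else c


def detecting_inputs(fault):
    """Test set of `fault` for the AND gate."""
    return frozenset(t for t in INPUTS if and_gate(*t, fault) != and_gate(*t))


def relation(f1, f2):
    """Classify the relationship between the test sets of two faults."""
    t1, t2 = detecting_inputs(f1), detecting_inputs(f2)
    if t1 == t2:
        return "equivalent"
    if t1 < t2 or t1 > t2:
        return "dominance"
    if t1.isdisjoint(t2):
        return "disjoint"
    return "overlap"
```

For example, `detecting_inputs(("a", 1))` is the single vector (0, 1), in agreement with the text, and no pair of single faults in the gate falls into the "overlap" category.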
Figure 3. The elementary gates (a), the test sets (b) and the class structure (c)


A tree circuit is depicted in Figure 4(a) and its fault class structure in Figure 4(b). Although AND and OR gates are used the fault class structure of Figure 4(b) is representative of the fault class structure of any arbitrary tree composed of elementary gates.

The exact number of supplementary faults for a given lead can be derived with the aid of the following definitions.

Definition 4: A lead y is a successor of a lead x iff every path from x to the circuit output also passes through y.

Definition 5: A lead x is a predecessor of a lead y iff y is a successor of x. In Figure 4(a) lead y is a successor of lead x and x is a predecessor of y.

Theorem 3: The number of supplementary single faults to a failure on lead x in an arbitrary tree circuit composed of arbitrary combinations of elementary gates is:


$$p + p - n_{pre} - n_{suc} \tag{12}$$

where p is the number of leads, $n_{pre}$ the number of predecessor leads of lead x and $n_{suc}$ the number of successor leads of x.


Proof: For a tree with p leads the fault class structure will always consist of two trees of p nodes each regardless of what combination of elementary gates are used. Of the 2p single faults, any fault in one tree is supplementary with any fault in the other tree since their test sets are disjoint. A lower bound on the number of supplementary faults is thus p. To illustrate this consider faults $f_a$ and $f_b$ in Figure 4. If we denote their test sets by $T_a$ and $T_b$, respectively, it is easy to demonstrate that $T_a \cap T_b = \phi$. No matter what the relative positions of $f_a$ and $f_b$ are there is always a fault $f_g$ in one tree such that $T_g = T_a \cup s$ (where $\cup$ denotes set union) and a corresponding fault $f_h$ in the other tree such that $T_h = T_b \cup r$. Also the faults $f_g$ and $f_h$ will be in the same elementary gate so that $T_g \cap T_h = \phi$. Thus

$$(T_a \cup s) \cap (T_b \cup r) = \phi$$
$$(T_a \cap T_b) \cup (T_a \cap r) \cup (s \cap T_b) \cup (s \cap r) = \phi$$

This can be the empty set only if each individual term is empty. Hence $T_a \cap T_b = \phi$.

Within each tree of the fault class structure any fault on lead x will either dominate or be equivalent to any predecessor node because of the transitivity of the dominance and equivalence relationships [11,12]. Hence there can be no supplementary failures on the $n_{pre}$ predecessor leads since a test set for a fault on a predecessor lead will not be disjoint from the test set for a fault on lead x. Similarly, faults on successor leads to x cannot be supplementary to x since they will either dominate or be
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equivalent  to  it.  This  accounts  lor  the  negative  tenns  in  (12). 

All that remains to be shown is that each remaining lead contributes exactly one supplementary
fault. For any given successor lead, y, one tree of the fault class structure will have a fault on y
dominating a fault on x or a fault on a successor lead of x. The other tree will have an equivalence
relation. Refer to Figure 4 to clarify the discussion. Assume lead x is represented by faults $f_1$, $f_?$
and lead y by $f_?$, $f_6$. It will be shown that faults $f_?$, $f_4$, $f_5$ are supplementary to $f_?$ but $f_6$, $f_?$
are not supplementary to $f_?$. (Subscripts written "?" are illegible in the source.)

The test sets for $f_?$ and $f_?$ are disjoint, hence $f_? \sim f_?$. By the transitivity of equivalence and
dominance, the same disjoint relationship holds for the test sets of $f_?$, $f_?$. Hence $f_?$ is supplementary
to all the faults in the subtree dominated by $f_?$. By similar reasoning, $f_? \not\sim f_?$ because the test sets are
not disjoint. Hence $f_6 \not\sim f_?$ and $f_? \not\sim f_?$. By the transitivity of equivalence and dominance, the above
reasoning holds for all successors of lead x. In summary, each subtree dominated by a fault on a
successor of lead x contributes one supplementary fault per lead. Taken over all successors of x the
result is one supplementary fault per non-successor and non-predecessor lead of x. This completes the
proof.

With Theorem 3 the number of supplementary single faults in an arbitrary tree can be deduced by
inspection. A closed form formula will now be derived for a full, uniform tree.
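The inspection rule of Theorem 3 can be sketched in a few lines of Python. This is a minimal sketch, not the authors' procedure: the `parent` map, the function name, and the convention that a lead is neither its own predecessor nor its own successor are assumptions introduced here. That convention was chosen because it reproduces the value $S_2 = 20$ quoted in the text for a single two-input NAND gate.

```python
from collections import defaultdict

def supplementary_pairs(parent):
    """Count ordered supplementary single-fault pairs (one fault per module).

    `parent` (hypothetical representation) maps each lead to the lead it
    feeds, i.e. its immediate successor, with None for the output lead.
    """
    children = defaultdict(list)
    for lead, succ in parent.items():
        if succ is not None:
            children[succ].append(lead)

    def n_suc(lead):
        # Successors: leads on the unique path from `lead` to the output.
        count = 0
        while parent[lead] is not None:
            lead = parent[lead]
            count += 1
        return count

    def n_pre(lead):
        # Predecessors: every lead in the subtree feeding `lead`.
        return sum(1 + n_pre(c) for c in children[lead])

    p = len(parent)
    # Each of the 2p faults is supplementary to the p faults in the other
    # tree of the fault class structure.
    cross_tree = 2 * p * p
    # Each lead gains one more supplementary fault per lead that is neither
    # its predecessor nor its successor (assumption: the lead itself excluded).
    same_tree = sum(p - 1 - n_pre(x) - n_suc(x) for x in parent)
    return cross_tree + same_tree

nand = {"z": None, "a": "z", "b": "z"}
two_level = {"z": None, "u": "z", "v": "z",
             "a": "u", "b": "u", "c": "v", "d": "v"}
print(supplementary_pairs(nand))       # 20
print(supplementary_pairs(two_level))  # 120
```

For the single gate the count splits into 18 cross-tree pairs and 2 same-tree pairs (the two input leads are unrelated to each other).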

Theorem 4: The number of supplementary faults in a full tree of t-input elementary gates of $\ell$ levels is

$$S_2 = 2\left(\frac{t^{\ell+1}-1}{t-1}\right)^2 + [\text{remaining terms illegible in source}] \qquad (13)$$


Proof: From the first term of (12) each faulty lead is supplementary to p other lead faults in the
other module. Since there are p leads in a module and two types of failures per lead the first term in
(13) should represent $2p^2$. We must show that $p = \frac{t^{\ell+1}-1}{t-1}$. At each level i of the tree there are $t^i$
leads. Thus $p = \sum_{i=0}^{\ell} t^i = \frac{t^{\ell+1}-1}{t-1}$. The remaining terms count the number of leads that are not
predecessors or successors of a given lead, for all leads. For level k there are $t^k$ leads which are not
predecessors or successors to subtrees consisting of $\ell-k, \ell-k+1, \ldots, \ell-1$ levels. Levels are numbered from the
module output to input. Thus the number of supplementary faults is augmented by:


[Equations (14) through (16), the intermediate steps that carry this sum to the closed form (13), are
not recoverable from the source.]

This completes the proof of the theorem.
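The counting argument of this proof can be checked on the smallest case. The block below is a reading of that argument, not a reconstruction of the lost intermediate equations. The symbol $q_k$ (the number of leads in the subtree rooted at a level-k lead, that lead included) and the convention that a lead is neither its own predecessor nor its own successor are assumptions introduced here.

```latex
p = \sum_{i=0}^{\ell} t^{i} = \frac{t^{\ell+1}-1}{t-1},
\qquad
q_k = \frac{t^{\ell-k+1}-1}{t-1}

% a level-k lead has k successors and q_k - 1 predecessors
S_2 = 2p^2 + \sum_{k=0}^{\ell} t^{k}\,\bigl(p - k - q_k\bigr)

% single two-input NAND gate: t = 2, \ell = 1, p = 3, q_0 = 3, q_1 = 1
S_2 = 2\cdot 3^2 + 1\cdot(3 - 0 - 3) + 2\cdot(3 - 1 - 1) = 18 + 0 + 2 = 20
```

The result agrees with the value $S_2 = 20$ that the text quotes for the NAND gate of Figure 2.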

The subtree counting technique employed in Theorem 4 can be used to calculate the number of
supplementary faults, $S_2$ (for a single lead failure in each module), for an arbitrary tree. Thus, from
equation (6)

$$R_{TWO} = 3\,S_2\,(1/2)^2\,R^{3p-2}\,(1-R)^2 \qquad (17)$$


Note that for the NAND gate of Figure 2 $\ell = 1$ and (13) yields $S_2 = 20$. Hence (17) agrees with the first
term of equation (7). Similar correspondences have been made between the fault equivalence and fault
dominance models for tree structured modules. Table 4 depicts the mission time improvement of the fault
dominance reliability model over the classical reliability model for various circuits. A four level
binary tree is also listed in Table 4 to demonstrate that the fault dominance model can lead to
substantial mission time improvement.
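The value $S_2 = 20$ can also be checked by exhaustive fault simulation. The sketch below assumes, following the proof of Theorem 3, that two single stuck-at faults are supplementary exactly when their test sets are disjoint. The function names, the lead naming scheme, and the restriction to full binary trees of two-input NAND gates are illustrative choices, not part of the original paper.

```python
from itertools import product

def build_nand_tree(levels):
    """Full binary tree of 2-input NAND gates (hypothetical helper).

    Returns (gates, inputs, output); `gates` maps a gate output lead to
    the pair of leads feeding that gate.
    """
    gates, frontier, counter = {}, ["n0"], 1
    for _ in range(levels):
        next_frontier = []
        for lead in frontier:
            kids = (f"n{counter}", f"n{counter + 1}")
            counter += 2
            gates[lead] = kids
            next_frontier.extend(kids)
        frontier = next_frontier
    return gates, frontier, "n0"

def evaluate(gates, lead, assignment, fault=None):
    """Logic value on `lead`; `fault` is (lead, stuck_value) or None."""
    if fault is not None and fault[0] == lead:
        return fault[1]
    if lead in gates:
        a, b = (evaluate(gates, k, assignment, fault) for k in gates[lead])
        return 1 - (a & b)
    return assignment[lead]

def count_disjoint_pairs(levels):
    """Ordered pairs of distinct single faults with disjoint test sets."""
    gates, inputs, out = build_nand_tree(levels)
    leads = list(gates) + inputs
    vectors = [dict(zip(inputs, bits))
               for bits in product((0, 1), repeat=len(inputs))]
    good = [evaluate(gates, out, v) for v in vectors]
    tests = {}
    for lead in leads:
        for stuck in (0, 1):
            # Test set: input vectors on which the fault flips the output.
            tests[(lead, stuck)] = frozenset(
                i for i, v in enumerate(vectors)
                if evaluate(gates, out, v, (lead, stuck)) != good[i])
    return sum(1 for f in tests for g in tests
               if f != g and not (tests[f] & tests[g]))

print(count_disjoint_pairs(1))  # 20
print(count_disjoint_pairs(2))  # 120
```

The single-gate result matches the figure quoted above, and the two-level result matches what the per-lead counting rule of Theorem 3 gives for a seven-lead tree.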


Table 4.  Mission time improvement, I, of the fault equivalence
          reliability model and fault dominance reliability
          model over the classical reliability model for various
          modules.

                                 Reliability
Module / Model                   0.75    0.8     0.85    0.9     0.95    0.99

Single NAND gate
  Equivalence Model              1.476   1.477   1.481   1.484   1.491   1.496
  Dominance Model                1.338   1.382   1.405   1.439   1.472   1.491

Two NAND gate
  Equivalence Model              1.404   1.497   1.510   1.515   1.526   1.539
  Dominance Model                1.335   1.384   1.414   1.452   1.492   1.531

Four Level Full Binary Tree
  Dominance Model                1.405   1.451   1.305   1.575   1.663   1.766
  Multiple Fault Model           1.300   1.318   1.389   1.361   1.386   1.408
  Dominance plus Multiple        1.442   1.483   1.535   1.598   1.692   1.771

In order to illustrate that equation (13) is the dominant term in equation (6) a reliability model
employing multiple faults can be developed using the following theorem:

Theorem 5: A lower bound on the number of supplementary faults for N lead failures in two modules of a
full $\ell$ level tree of t-input elementary gates is:

$$S_{2,N} = [\text{expression illegible in source}] \qquad (18)$$

Proof: Equation (18) enumerates the subtree multiple faults (SMF), those multiple faults whose component
faults are all in one subtree with one fault at the root of the subtree. Since the fault at the root
masks the effects of other faults in the subtree, the SMF will be supplementary to any other SMF in the
other module so long as the root faults are in different fault class structure trees. At level k there
are $t^k$ subtrees each with $t^{\ell-k+1}$ branches. Thus there are [expression illegible in source] SMFs,
where $\binom{x}{y}$ is defined as zero if $y > x$ or $y < 0$. The outer sum in (18) enumerates
the ways N faults can be distributed between two modules with at least one fault per module. Finally,
only two of the N faults have specified values (those at the roots of the subtrees), hence there are
$2^{N-2}$ values of s-a-1 or s-a-0 the multiple fault can assume and still be supplementary. Finally, there
are two ways the root faults can be in different fault class structure trees. This accounts for the
$2 \cdot 2^{N-2}$ factor in (18) and completes the proof.

Equation (17) can now be amended to

$$R_{TWO} = 3 \sum_{N=2}^{2p} S_{2,N}\,(1/2)^N\,R^{3p-N}\,(1-R)^N \qquad (19)$$


Equation (19) is a lower bound since there are multiple supplementary faults which do not have all of
their component faults restricted to a single subtree. In Table 4 the Multiple Fault Model is compared
to the Dominance Model for a four level binary tree for $S_2$ exact (13), $S_{2,N}$ estimated (18) for all N, and
a combination of double and multiple faults (19). From the comparison we can see that $S_2$ exact is the
dominant contributor to the mission time improvement and that multiple faults as given by (18) are a
second order effect.

Equation (13) can be used to estimate $S_2$ for arbitrary trees or circuits with reconvergent fan-out.

An upper bound on $S_2$ for an arbitrary tree would be equation (13) with the maximum depth of the tree
substituted for $\ell$ and the maximum number of inputs per gate for t. This accounts for all supplementary faults
in the circuit and for some that do not exist. A lower bound would be (13) with minimum $\ell$ and minimum t.
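The bracketing described above can be illustrated on a small irregular tree. This is a sketch under stated assumptions: `full_tree_count` evaluates the counting argument of Theorem 4 directly as a sum (it stands in for the closed form (13), which is not used here), "minimum $\ell$" is read as the shortest input-to-output path, and the example circuit and all names are hypothetical.

```python
from collections import defaultdict

def full_tree_count(levels, t):
    """Counting argument applied to a full tree of t-input gates."""
    # size[k]: leads in the subtree rooted at a level-k lead, itself included.
    size = [sum(t**i for i in range(levels - k + 1))
            for k in range(levels + 1)]
    p = size[0]
    same_tree = sum(t**k * (p - k - size[k]) for k in range(levels + 1))
    return 2 * p * p + same_tree

def tree_profile(parent):
    """Return (count, min_depth, max_depth, min_fan_in, max_fan_in).

    `parent` maps each lead to the lead it feeds (None for the output).
    """
    children = defaultdict(list)
    for lead, succ in parent.items():
        if succ is not None:
            children[succ].append(lead)

    def depth(lead):
        # Number of successors of `lead`.
        return 0 if parent[lead] is None else 1 + depth(parent[lead])

    def subtree(lead):
        # The lead itself plus all of its predecessors.
        return 1 + sum(subtree(c) for c in children[lead])

    p = len(parent)
    count = 2 * p * p + sum(p - depth(x) - subtree(x) for x in parent)
    leaf_depths = [depth(x) for x in parent if not children[x]]
    fan_ins = [len(kids) for kids in children.values() if kids]
    return (count, min(leaf_depths), max(leaf_depths),
            min(fan_ins), max(fan_ins))

# Hypothetical circuit: z = gate(u, c), u = gate(a, b).
irregular = {"z": None, "u": "z", "c": "z", "a": "u", "b": "u"}
count, d_min, d_max, t_min, t_max = tree_profile(irregular)
lower = full_tree_count(d_min, t_min)
upper = full_tree_count(d_max, t_max)
print(lower, count, upper)  # 20 58 120
```

For this five-lead tree the per-lead count of Theorem 3 falls between the full-tree values for depth 1 and depth 2, as the bounds predict.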

The fault dominance model can also be used for circuits with reconvergent fan-out. The circuit is
modeled by an identical circuit with fan-out points removed. Figure 5(a) depicts an exclusive-OR
circuit and Figure 5(b) the tree circuit used to calculate the supplementary faults. This value for $S_2$ will
be a lower bound since not all faults are considered. All supplementary faults in the reduced circuit
will also be supplementary in the original circuit since fan-out points only restrict the values of test
sets. Gate inputs are no longer independent. Table 5 depicts the mission time improvement for the
exclusive-OR and a SN74147 10-line-to-4-line priority encoder chip. The latter consisted of 31 gates and
[?]0 leads. The value of $S_2$ was calculated by inspection. The number of non-successor and non-predecessor
leads for each lead, taken over all leads, took less than 10 minutes to determine by hand. This
illustrates the applicability of the technique to large circuits.


Table 5.  Mission time improvement, I, of the fault dominance
          reliability model over the classical reliability
          model for modules with reconvergent fan-out.

                     Reliability
Module               0.75    0.8     0.85    0.9     0.95    0.99

exclusive-OR         1.196   1.207   1.219   1.232   1.240   1.259
priority encoder     1.228   1.244   1.263   1.283   1.304   1.324


Figure  5.  An  exclusive-OR  circuit  (a)  and  the  modeled  circuit  with  fan-out 
removed  (b). 
The fault dominance model can also be used to calculate $P_{110}$ by dividing the number of supplementary
faults by the total number of single faults. To find the maximum value of $P_{110}$ for a full tree divide
equation (13) by [divisor illegible in source] and take the limit as t approaches infinity. The result is:

$$1/2 + 1/4 - [\text{remaining term illegible in source}]$$


Hence $P_{110}$ is between 1/2 and 3/4 for a binary tree. It can be shown that adding levels to a tree causes
$P_{110}$ to increase while adding an inverter or a new gate input causes $P_{110}$ to decrease.

CONCLUSION 

A technique, the fault equivalence reliability model, has been developed and shown to increase
mission time for some simple circuits. A computationally simpler model, the fault dominance
reliability model, has been demonstrated to be a good approximation to the fault equivalence model. Both
techniques can be applied to calculate the $P$'s for modules employing the Poisson failure assumption.
The classical reliability model may be sufficient for establishing the better of two different fault
tolerant architectures where the mission time improvement may be on the order of 5-20. However, in fine
tuning of the design and predictions of the reliability of the final system a method which accounts for
compensating module failures should be used.